
Add an implementation of ParticleCopyPlan::doHandShake that uses one-sided communication from MPI-3 (#5227)

Open
atmyers wants to merge 7 commits into AMReX-Codes:development from atmyers:one_sides_handshake

Conversation


@atmyers atmyers commented Mar 25, 2026

This is one of the possible optimizations raised in Issue #4179.

This passes all tests locally and on Perlmutter. However, the one-sided version of doHandShake doesn't seem to be a performance win in the tests I ran.

The new handshake is only needed for the global redistribution path. To trigger this path, as a first test I made a particle redistribution benchmark where 1% of the particles jump to a random location in the domain. For this communication pattern, the one-sided handshake is basically the same as reduce/scatter on up to 128 nodes, then slower after that.

[Figure: scaling_comparison_global]
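For reference, the two handshake strategies being compared can be sketched as a pure-Python simulation (this is an illustrative model, not the AMReX/MPI implementation; the function names and data layout are made up for the example). In the reduce/scatter version, each rank contributes a 0/1 flag per destination rank and a summing reduce-scatter tells every rank how many messages it will receive. In the one-sided version, each sender atomically increments a counter exposed in the receiver's RMA window (think MPI_Accumulate with MPI_SUM on an MPI_Win):

```python
def handshake_reduce_scatter(send_targets, nranks):
    # send_targets[i] = set of ranks that rank i will send particles to.
    # Each rank owns a row of 0/1 flags; summing column j (which is what
    # the reduce-scatter does) gives rank j its number of incoming messages.
    flags = [[1 if j in send_targets[i] else 0 for j in range(nranks)]
             for i in range(nranks)]
    return [sum(flags[i][j] for i in range(nranks)) for j in range(nranks)]

def handshake_one_sided(send_targets, nranks):
    # Each rank exposes a message counter in an RMA window; every sender
    # atomically increments its target's counter, then an epoch-ending
    # synchronization makes the counts visible to the owners.
    counters = [0] * nranks
    for i in range(nranks):
        for j in send_targets[i]:
            counters[j] += 1  # stands in for a remote atomic increment
    return counters

# Sparse example pattern on 4 ranks: both strategies agree.
targets = [{1, 2}, {0}, {3}, set()]
assert handshake_reduce_scatter(targets, 4) == handshake_one_sided(targets, 4)
```

Note why a sparse pattern should, in principle, favor the one-sided variant: the reduce-scatter always reduces a full nranks-long flag vector per rank, while the one-sided version only touches the counters of the ranks actually being sent to.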

Then I thought: the above communication pattern has each rank, on average, sending some particles to every other rank, basically an all-to-all, which is a bad case for the one-sided version. So I did another test modeled on what we do in load balancing in WarpX. Here there are 2 boxes per GPU, so on 1024 ranks there would be 2048 boxes. Instead of moving the particles, each step I change the distribution map randomly, so that on average each rank will send particles to 2 other ranks and receive from 2. This kind of sparse communication pattern should be a good case for the one-sided version. However, even here the performance of the two handshakes is basically the same.

[Figure: scaling_comparison_regrid]
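The regrid benchmark's communication pattern can be modeled in a few lines of Python (a hypothetical sketch of the setup described above, not the actual benchmark code): every box is assigned a new random owner, and a rank sends to the new owner of each box it used to hold, so with 2 boxes per rank each rank sends to at most 2 other ranks.

```python
import random

def regrid_pattern(nranks, boxes_per_rank=2, seed=0):
    # Hypothetical model of the regrid benchmark: each rank owns
    # boxes_per_rank boxes; a new random distribution map reassigns every
    # box, and the old owner must send that box's particles to the new one.
    rng = random.Random(seed)
    old_owner = [b // boxes_per_rank for b in range(nranks * boxes_per_rank)]
    new_owner = [rng.randrange(nranks) for _ in old_owner]
    targets = [set() for _ in range(nranks)]
    for old, new in zip(old_owner, new_owner):
        if new != old:
            targets[old].add(new)
    return targets

targets = regrid_pattern(1024)
avg_out_degree = sum(len(t) for t in targets) / len(targets)
# With 2 boxes per rank, each rank sends to at most 2 other ranks, so the
# pattern is sparse: the average out-degree is near 2, not near 1023.
```

This is the regime where the per-target cost of the one-sided handshake should be smallest relative to the full-vector reduce-scatter, which is why the flat result in the plot above is the more surprising one.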

None of these runs included the fix in PR #5260, which would improve the overall redistribute scaling but not affect the handshake time.

Overall, I think the regime in which the one-sided handshake would be expected to win is relatively small: it would need to trigger the global redistribution path, but with a sparse communication pattern. And even in that case, I didn't see a win in the test I ran.

However, since the one-sided method is off by default, I think we should merge this as an option: the timings depend on how good the RMA support is in a specific MPI implementation, and this may work better on other systems.

The proposed changes:

  • fix a bug or incorrect behavior in AMReX
  • add new capabilities to AMReX
  • changes answers in the test suite to more than roundoff level
  • are likely to significantly affect the results of downstream AMReX users
  • include documentation in the code and/or rst files, if appropriate
